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We describe a method for reconstructing multi-scale entangled states from a small number of 
efnciently-implementable measurements and fast post-processing. The method only requires single- 
particle measurements and the total number of measurements is polynomial in the number of par- 
ticles. Data post-processing for state reconstruction uses standard tools, namely matrix diagonali- 
sation and conjugate gradient method, and scales polynomially with the number of particles. Our 
method prevents the build-up of errors from both numerical and experimental imperfections. 



I. INTRODUCTION 

Quantum state tomography [1J is a method to learn a 
quantum state from measurements performed on many 
identically prepared systems. This task is crucial not 
only to assess the degree of control exhibited during the 
preparation and transformation of quantum states, but 
also in comparing theoretical predictions to real-life sys- 
tems. For instance, numerical methods are used to com- 
pute the ground states or thermal states of model quan- 
tum systems. Quantum state tomography could be used 
to check that the experimental state corresponds to the 
predicted one, thus providing an essential link between 
theory and experiments. For example, one could in prin- 
ciple use tomography to settle the question [1] of which 
states correctly describe the quantum Hall fluid at vari- 
ous filling parameters. 

In practice however, the state of n particles is described 
by a number of parameters that scales exponentially with 
n. Therefore, tomography requires an exponential num- 
ber of identically prepared systems on which to perform 
exponentially many measurements needed to span a ba- 
sis of observables that completely characterizes the state. 
Furthermore, solving the inference problem to determine 
the quantum state that is compatible with all these mea- 
surement outcomes requires an exponential amount of 
classical post-processing. These factors limit tomogra- 
phy to at most a few tens of particles. 

While this is unavoidable for a generic state, many 
states encountered in nature have special properties that 
could be exploited to simplify the task of tomography. 
In fact, the overwhelming majority of tomographic ex- 
periments performed to date [3H5] were used to learn 
state described with only a few parameters. Such varia- 
tional states — family of states specified with only a few 
parameters — are widespread in many-body physics be- 
cause they are tailored for numerical calculations and can 
predict many phenomena observed in nature (Kondo ef- 
fect, superconductivity, fractional statistics, etc). One 
example, familiar to the quantum information commu- 
nity, is matrix product states (MPS) |10ffT5] that are at 
the heart of the density matrix renormalisation group 
(DMRG) numerical method, suitable for the description 
of one-dimensional quantum systems with finite correla- 
tion length [13] . 

Recently, we and others have demonstrated [TS] that 



tomography can be performed efficiently on MPS, i.e., 
such states can be learned from a small number of sim- 
ple measurements and efficient classical post-processing. 
Here, we take this result one step further and demon- 
strate that it is possible to efficiently learn the states asso- 
ciated to the multi-scale renormalization ansatz (MERA) 
introduced by Vidal |16| . for which efficient numerical al- 
gorithms to minimize the energy of local Hamiltonians 
exist [IT]. As opposed to MPS, these MERA states are 
not restricted to one dimension and can describe sys- 
tems with long-range correlation. This last distinction 
is important because one of the most interesting phe- 
nomena in physics, quantum phase transitions, leads to 
a diverging correlation length and arc therefore not suit- 
ably described by MPS. In contrast, MERA have been 
successfully used to study numerous many-body models, 
such as the critical Ising model in ID [17] and 2D |18| . 
and can also accurately describe systems with topological 
order [MSI]. 

In this work, we present two related methods to learn 
the one-dimensional MERA description of a state us- 
ing tomographic data obtained from local measurements 
performed on several copies of the states. Our learn- 
ing methods for MERA are based on the identification 
of the unitary gates in the quantum circuit that out- 
puts the MERA state. In that regard, this Article is 
a continuation of our work on MPS and is reminiscent 
of early methods to numerically optimise MERA tensors 
|25_. However, going from MPS to MERA is non-trivial 
because MERA exhibits a spatial arrangement of gates 
that is more elaborate. Since MERA is a powerful numer- 
ical tool, our learning method bridges the gap between 
numerical simulations and experiments by allowing the 
direct comparison of numerical predictions to experimen- 
tal states. 

The first method we present requires unitary control 
of the system and the ability to perform tomography on 
blocks of a few particles, which can be realized using the 
correlations between single-particle measurements. Cru- 
cially, the size of those blocks does not depend on the 
total size of the system, making it a scalable method. In 
an experiment, one cannot know beforehand if the state 
in the lab is a MERA. However, our method contains a 
built-in certification procedure from which one can assess 
the proper functioning of the method as the experiments 
are performed and conclusively determine if the state is 
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well described by the MERA. The second method builds 
on the first one, but completely circumvents the need for 
unitary control. Thus, this MERA learning method can 
be implemented with existing technologies. The draw- 
back of this simplified method is that it does not come 
with a built-in certification procedure. Certification in 
this case can be realized using the Monte Carlo scheme 
|26| . which requires the same experimental toolbox. 

The rest of the paper is organised as follows. We first 
present the proposed method for MERA learning in sec- 
tion|TTJ Subsection |II A| explains how to identify the disen- 
tanglers. We start by deriving a necessary condition for 
the existence of suitable disentangler and then turn this 
criterion into a heuristic objective function that we min- 
imize numerically over unitary space. In subsection |II B[ 
we carefully analyze the buildup of errors in our proce- 
dure and show that errors only accumulate linearly with 
the size of the system. In subsection |II C[ we present 
numerical benchmarks of our tomography method. In 
subsection, |IID| we present the simplified method by 
demonstrating that it is not necessary to apply the dis- 
entanglers to the experimental state since we can simu- 
late the effect of those disentanglers numerically, albeit 
at the cost of more repeated measurements and a slightly 
worse error scaling (analyzed in appendix [A|. In section 
III| we discuss potential issues for our numerical scheme 
and suggest modifications to prevent them. Finally, we 
present in appendix [S] a tool to contract two different 
MERA states, which allows for the efficient comparison 
of a MERA whose parameters have been identified ex- 
perimentally using our method to a predicted theoretical 
MERA state. 



II. MERA LEARNING 

A. Identifying the disentanglers 

MERA states can be described as the output of a quan- 
tum circuit [H] whose structure is represented on Fig. [I] 
(as seen with inputs on the top and output at the bot- 
tom). For simplicity, we will focus on a one dimensional 
binary MERA circuit for qubits, but our method gener- 
alizes to all ID MERA states, i.e., particles could have 
more internal states thus accounting for a larger MERA 
refinement parameter \ and isometries could renormal- 
ize several particles to one effective particle. The circuit 
contains three classes of unitaries. Disentanglers (repre- 
sented as □) are two-qubit unitary gates; isometries (rep- 
resented as A) are also two-qubit gates but with with one 
input qubit always in the |0) state; the top tensor (rep- 
resented as O) is a special case of isometry that takes as 
input two qubits in the 1 00) state. Each renormalisation 
layer is made of a row of disentanglers and a row of isome- 
tries. Disentanglers remove the short-scale entanglement 
between adjacent blocks of two qubits while isometries 
renormalise each pair of qubits to a single qubit. Each 
renormalisation layer performs theses operations on a dif- 
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Figure 1: The optimal disentangler U can be computed from 
the tomographic estimation of the density matrix P123 on the 
first three qubits. Once applied, the resulting state pi2 [U] is 
very close to a rank 2 matrix. Thus, there exist a unitary V 
such that p~i2[U] can be rotated such that the first qubit is 
almost in the state |0). 



ferent lengthscale. The quantum circuit thus mirrors the 
renormalisation procedure that underlies the MERA. 

Learning a MERA amounts to identifying the vari- 
ous gates in that circuit, which turns the experimental 
state into the all |0) state. The intuitive idea behind our 
scheme is to proceed by varying the isometries and dis- 
entanglers until the "ancillary" qubits reach the state |0) 
for each row of isometries. We will exploit this feature to 
numerically determine each disentangler. 



1. Necessary condition for disentangler 

Consider the n qubits at the lowest layer of the MERA. 
Let P123 be the reduced density matrix on the three first 
qubits (see Fig. [TJ. If the state is exactly a MERA, there 
exists a unitary U23 acting on qubits 2 and 3 (see left of 
Fig. [IJ such that applying this unitary and tracing out 
the 3rd qubit yields a density matrix 



P12 [U; 
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of rank at most 2. Indeed, if the rank was strictly greater 
than 2, it would be impossible for the isometry V (see 
left of Fig. [IJ to map the density matrix p 12 [E/23] to a 
state with one of the qubit in the state |0) because the 
dimension of the space |0) <g) C 2 would be strictly smaller 
than the dimension of the support of the density matrix. 
Hence, we have the necessary criterion 
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has rank less or equal than 2. (2) 



To find a unitary that fulfills this criterion, it is nec- 
essary to know the state /O123, and this can be achieved 
by brute- force tomography on these three qubits. Once 
the original state on the three qubits is known, one has to 
perform a search over the space of unitaries to find a suit- 
able disentangler. To do this, we will define an objective 
function to minimise numerically. 

Once this optimal unitary operator U has been found 
numerically, it is necessary to consider how it modifies the 



quantum state before learning the other elements of the 
circuit. One obvious way to do so is to apply the unitary 
transformation to the experimental state and continue 
the procedure on the transformed state. This amounts 
to undoing the circuit, and should in the end map the 
experimental state to the all |0) state. For simplicity 
we will first present our scheme assuming that the state 
is transformed at every step this way. Of course, such 
unitary control increases the complexity of the scheme 
and could be out of the reach of current technologies. 
However, in section [lID[ we will explain how this unitary 
transformation can be circumvented at the cost of a slight 
increase in the number of measurements. 

After the optimal disentangler U has been applied to 
the state, we need to identify the unitary V that rotates 
the density matrix on the first two qubits such that the 
first qubit is brought to the |0) state, c.f. Fig [I] left. 
This does not require any additional tomographic esti- 
mate since we already know the descriptions of the state 
on the three first qubits and the disentangler. We can 
thus compute the state on the first two qubits pi2\U] and 
diagonalise it to obtain the eigenvectors corresponding to 
its two non-zero eigenvalues. The unitary V is chosen to 
map those two eigenvectors to the space |0) £S)C 2 , i.e., V 
rotates the qubits such that the support of the density 
matrix is mapped to a space where the first qubit is in 
the |0) state. 

All other disentanglers of this layer can be found by 
recursing the above procedure. Once a disentangler has 
been identified, it is physically applied to the system 
and brute-force tomogrography is performed on the next 
block of three qubits. 

Notice that for the last block, a single unitary is re- 
sponsible for minimising the rank of two density matrices, 
Pn-3,n-2 and Pn—^n- One possible way to handle this is 
to get a tomographic estimate of the state on the last 
four qubits and to try to minimise the rank of both re- 
duced matrices. Another way, for which we have opted in 
our numerical simulations, is to perform multiple sweeps 
over the layer. For instance, the disentanglers will first be 
identified from left to right and then the next sweep will 
be performed from right to left, using the disentanglers 
found in the first sweep as initial guesses in the space of 
unitaries (see Fig. [2]). The number of sweeps can be in- 
creased for better accuracy but each additional sweep re- 
quires to extract the tomographic estimates again. Mul- 
tiple sweeps would also allow to apply our method to 
MERA states with periodic boundary conditions in ID 
and could be useful for 2D-MERA states. While this 
would be an interesting continuation of our work, we fo- 
cus on 1D-MERA for the rest of the article. 



2. Heuristic objective functions 

One of the steps in our protocol consists in identify- 
ing the unitary U that minimizes the rank of pi2[U], c.f. 
eq. (HI. There are many distinct ways this can be done 
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Figure 2: Identification of the disentanglers using two succes- 
sive sweeps of the chain. Dotted regions cover particles on 
which brute-force tomography is performed. The first sweep 
(red dotted regions) finds unitaries starting from the left end 
of the chain. Those unitaries will be used as initial guesses 
for the second sweep (blue striped regions) that starts from 
the right end of the chain. 

and in this section, we present a practical heuristic to 
accomplish this task. Minimising the rank of the density 
matrix px2 [U] is not a suitable numerical task because, 
even if the experimental state is an exact MERA, the 
inferred density matrix will typically have full rank due 
to machine precision and the imperfect tomographic esti- 
mation of /J123. Thus, we turn the problem of finding U23 
into an optimization problem by considering the eigende- 
composition of/512 [U] — J2k Afc|V'fc)(V'fc| where the eigen- 
values are sorted in decreasing order Ai > A2 > A3 > A4. 
If pi2[f] has most of its support on a two-dimensional 
space, it will have two small eigenvalues that are typi- 
cally non-zero due to imperfections. We thus consider 
the objective function 

/(Cf,Pia3) = I>* (3) 

fc>2 

and we perform a minimisation over the space of uni- 
taries to determine the optimal unitary U. This objec- 
tive function has a well-defined operational meaning - it 
is the probability of measuring the disentangled qubit in 
the |1) state after the isometry V has been applied. We 
will see in section |IIB| that this property can be used 
to certify the distance between the experimental and the 
reconstructed states. 

Another way to think about this objective function is 
to consider the characteristic polynomial P[X] of P12 [U] 
which is of the form 

P[X] = A 4 - X 3 + aX 2 -bX + c (4) 

where the coefficients a, b and c are positive since they 
correspond to the sum of product of the positive eigen- 
values of the density matrix. In particular, coefficient 
b is the sum of all products of three eigenvalues, i.e., 
b = A1A2A3 + Ai A2A4 + A1A3A4 + A2A3A4. In order for the 
rank of the density matrix to be 2, it is sufficient for all 
4 products of three eigenvalues to vanish, i.e, 

P12 [U] of rank less than 2 <^=> 6 = 0. (5) 
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Thus, another suitable objective function is the posi- 
tive coefficient b. which is a polynomial in the entries of 
P12 [U] . Indeed, using Bocher formula, coefficient b can be 
expressed as 66 = 1 - 3TrA 2 + 2TtA 3 where A = p 12 [U]. 
Thus, b is a well-behaved function with respect to the 
density matrix. Note also that b can be expressed without 
diagonalising the density matrix p\2[U\. We will focus on 
minimising (J3j) in all subsequent numerical discussion and 
results. 



3. Numerical minimisation over unitary space 

Minimisations of ^ is performed using a conjugate 
gradient method. We first have to account for the fact 
that the unitary manifold is not a vector space. To get 
around this problem, we go to the Hermitian space by 
writing any unitary U as the result of a Hamiltonian 
evolution, i.e., there exists a Hermitian matrix H such 



that U = 



iH 



It is then possible to use the standard 



conjugate gradient method. Let us sketch the algorithm 
in more details. 

First, we select a unitary Uq either at random or from 
an initial guess (provided for instance by a previous 
sweep). We will search the unitary space by generat- 
ing a sequence of unitaries {£4}. At the k th step of the 
minimisation, the algorithm is the following. 

1. We center the unitary space at point Uk-i by defin- 
ing Pk = (I ® [4-i)p/c_i(I <8> E/fc_i)t. 

2. We compute the gradient G^ by parametrizing 
the Hamiltonian H on 2 qubits by its decomposi- 
tion on the Pauli group H = Y]^,, h^a^^a^ where 
°"u € {I, o x i &y ■, &z } is a Pauli matrix. We succes- 
sively evaluate the component of the gradient G^ 
in the direction (/i, v) by looking at the effect of the 
test unitary U^. v = I + iec^ <X> o~ v on the objective 



function. 



G 



a small number. 



(fc) 

ft, V 



/(C/ M ,„,p fc )-/(I, p k ) 



where e is 



3. Instead of following the gradient, which would gen- 
erally undo some of the minimization performed 
in the previous steps, we use a conjugate gra- 
dient method where the new direction of search 
is optimized by taking into account the direc- 
tion used in the previous step G^" -1 ) through the 
Polak-Ribii'^cere formula. More precisely, & k ' = 
G^ + fi& k ~ is> in which the real parameter j3 is 



defined as f3 = max 



I U ' 0(k-l).Q(k-l) J' 



4. We perform a line search along the direc- 
tion & k ^ by considering the family of unitaries 

exp C— itY^n v G^.uU^ ® cr^j and optimizing the 



parameter t to find t op t- We then define 



U k = cxp -it opt ^2 G^uC 



U, 



k-l 



(G) 



which ends the k th iteration. 

We iterate until the objective function is close enough to 
zero or that improvement has stopped. 

This method is heuristic since the objective functions 
present no characteristic that would ensure the conver- 
gence of the conjugate gradient method. In particular, 
our search over unitary space depends on the starting 
point, i.e., the unitary chosen in the first iteration. In- 
deed, some starting points will lead the heuristic to a lo- 
cal minima where it will get stuck. In order to avoid this 
phenomenon, we can repeat the overall search by pick- 
ing at random (according to the unitary Haar measure) 
different initial points which lead to potentially different 
minima and keep the smallest of those minima, which we 
expect to be the global minimum. In any case, this is a 
minimization problem over a spapce of constant dimen- 
sion, so the method used to solve it does not affect the 
scaling with the number of particles n. Ultimately, we 
can always use a finite mesh over the unitary space and 
use brute- force search. Nevertheless, we found numeri- 
cally that this heuristic works well. 

It is also possible that a choice of unitary that is op- 
timal locally, in the sense that it minimizes |3 1, is not 
optimal globally as it might lead to a state for which it is 
impossible to find a disentangler obeying (|3| elsewhere in 
the circuit. This is a phenomenon that is more likely to 
occur when the minimum is degenerate, i.e., there exists 
several distinct (modulo gauge) exact disentanglers for 
the state. However, we have performed numerical exper- 
iments on randomly generated MERA states as well as 
physically motivated states and found that the conjugate 
gradient performs well (see section II C I . 



B. Error analysis 

In practice, due to numerical and experimental imper- 
fections, the disentangled qubits will not be exactly in 
the |0) state, but merely close to it. This situation arises 
from the conjunction of three causes : i) the experimental 
state of the system is not exactly a MERA, but merely 
close to one, ii) the tomographic estimate of the density 
matrices on blocks of three qubits are slightly inaccurate 
due to noisy measurements and experimental finite pre- 
cision, Hi) the numerical minimization did not find the 
exact minimum. 



1. Isolating each elementary steps to prevent error buildup 

Our error analysis will show that the buildup of errors 
is linear in the number of disentanglers of the MERA 
circuit which is itself linearly proportional to the num- 
ber of particles in the experimental state. Essentially, 
the distance between the reconstructed state and the ex- 
perimental state is the sum of the error made at each 
elementary step when estimating a disentangler and an 
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isometry. Fortunately, the error made at each elemen- 
tary step does not depend on the errors made at previous 
steps. The key to isolate each step from the others is to 
measure the qubit that should have been disentangled in 
the computational basis. With high probability the qubit 
will be found in the |0) state. While the probability of 
measuring the |0) outcome depends on previous errors, 
the post-selected state is now free from previous errors. 
The interest of this postselection is two-fold. First, it for- 
bids errors in previous steps to contaminate the state and 
amplify the error made at the current step, thus limiting 
the error propagation. Second, by accumulating statis- 
tics on this measurement, we can estimate the probability 
of the all-0 outcome and use it to bound the distance of 
the reconstructed state to the actual state in the lab. 
Therefore, our procedure comes with a built-in certifica- 
tion process. Let us now describe the error analysis in 
more details. 



Applying this disentangler and isometry to the whole 
state of the chain, one gets 

Ffe+it/fe+i . . . VxUx V>) = cm — — (10) 

where the accumulated error vector at step k + 1 is 

|e*+i) = kfc+i) + y/l + e k+1 V k+1 U k+1 \et m ) (11) 

and the square if its norm e| r J 1 = (ejSjejSi) obeys the 
recurrence relation 1 + ef^ 1 = (1 + e k +i) (1 + e£ m ) since 
the elementary error vector \e k +i), for which all previous 
ancillary particles have been disentangled, is orthogonal 
to the vector Vk+iUk+i\e k m ) ■ Thus, 

fe+i 

l + £fi = II (! + *)■ (12) 
t=i 



2. Error at each elementary step 

Recall the notation of Fig. [T] Due to numerical and 
experimental imperfections, the state on qubits 1, 2 and 
3 after applying the disentangler U\ and the isometry V\ 
is not exactly in the |0) (g)C 2 ^™ -1 ^ subspace but contains 
a small component orthogonal to that space. Thus, it 
has the form 



v/l + (ei|ei) 



(7) 



where 1 771 ) is the normalised pure state on qubits 2 to n if 
qubit 1 had been completely disentangled from the chain 
and I ei ) is some subnormalized vector supported on the 
subspace |1) ® C 2 ^™^ 1 ^ . The isometry V\ is chosen to 
minimize the norm of |ei), i.e., to minimize t\ = (ei|ei). 

Further along the layer, the state after applying k dis- 
cntanglers and k isometries will be of the form 



Vullk.-.V^M) = 



\0)® k \Vk) + \el m ) 



(8) 



where the first term \0)® k \r] k ) is the normalised state had 
the k qubits in position 1, 3. . .2k— 3 been completely dis- 
entangled from the chain and |e£ m ) is the accumulated 
error vector orthogonal to the space where those k qubits 
are in the |0)® fe state. In order to find the optimal disen- 
tangler and isometry, we measure the last disentangled 
qubit in the computational basis and post-select on the 
|0) outcome, which occurs with probability (1 + e^" 1 ) -1 . 
We then perform brute force tomography and identify 
numerically the disentangler and the isometry that min- 
imimizes the norm of the error vector \e k +i) such that 



V k+ iU k +i\Vk) 



\r) k +i) + [gfc+i) 
71 + e k +i 



(9) 



3. Global error 

After the choice of m disentanglers and m 
isometries, the reconstructed state is \<p) = 
VXUl % ...V^jj\\Q)'^ m+1 \r lm ). Its difference to the 
actual experimental state \ip) can be stated in terms of 
the (in) fidelity as 

i-|W#)l 2 = i-V(i + C) 

m 

= l-l/IJl + ei. (13) 

i=l 

Practically, one is interested in guaranteeing that the 
reconstructed state is close to the experimental up to 
global error E, i.e., to guarantee that 1 — |(</>|?/>)| 2 < E. 
Suppose that all error vectors are bounded, i.e. that for 
all step i, we have ti<£. Inverting (13), it suffices that 



e<(l-E) 



-l/m 



1 ~ E/m 



in the limit where the tolerable global error E is small. 
Thus, we see that errors accumulate linearly and that a 
precision inversely proportionnal to the number of dis- 
entanglers is sufficient to ensure a constant global error. 
Furthermore, statistics on the post-selection performed 
at each step allows to estimate each e£ m and in particu- 
lar e^ 1 that gives direct access to the distance betweem 
the reconstructed and experimental states. 

Finally, from this measurement data, one can estimate 
the error tj performed at each step in order to identify 
steps that have gone wrong. This information can be 
used to turn the scheme into an adaptative one. Sup- 
pose the error is particularly large for a given step. This 
might be due to an important amount of entanglement 
concentrated in one region of space, e.g. near a defect, 
which can be accounted for by increasing the MERA re- 
finement parameter x locally, i.e. by using disentanglers 
acting on a larger number of qubits. In principle, x could 
be increased until the error is below some threshold. 
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Figure 3: (top) Infidelity to the "experimental state", i.e, 1 — 
\{i>tomo\ip)\ 2 where is a random MERA on n qubits and 
\4>tomo) is the state reconstructed from the MERA learning 
method using three sweeps, (bottom) Processing time (on 
a standard laptop) to perform MERA learning using three 
sweeps. Both figures exhibit 10 runs for each number of qubits 
n £ {8, 12, 16, 20, 24}. In both figures, each x represents 
results for one random MERA. The full lines represent median 
for each number of qubits. The dashed line on the top figure 
is the linear approximation to the median. Notice that the 
numerical minimisation can fail to converge as illustrated by 
the atypical data points. For instance, for one of the 20- 
qubit MERA, the processing time was 338.3 seconds and the 
infidelity to the true state is large, 1 — \{iptomo\i>)\ 2 = 3.916 x 
10" 3 . 
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Figure 4: Infidelity to a 20 qubit state using a reconstructed 
method with a variable number of sweeps. Each line corre- 
sponds to a different random MERA. 



the conjugate gradient descents, and therefore scales lin- 
early with the number of disentanglers or the number of 
particles in the system. Figure ([3j bottom) shows the ac- 
tual runtime of the inference algorithm for different ran- 
domly chosen MERA states and of various sizes. Once 
again, we see a good agreement with a linear dependence 
with the system size. Systems of up to 24 qubits can eas- 
ily be handled in a few minutes of computation and re- 
quires 1197 different measurement settings for each sweep 
of the 24 qubit system. This is to be contrasted with the 
656,100 experiments needed to reconstruct the state of 
8 qubits in [3] and the post-processing of the data that 
took approximately a week [57] . Additional sweeps allow 
the method to converge as showed on Fig. [4] 

We also tested how our method behaved on a physical 
model, namely the ID Ising model with transverse field 
at the critical point. The results obtained where coherent 
with what is expected from the approximation of the true 
ground state with a MERA with refinement parameter 
X = 2. 



C. Numerical performance 

1. Benchmarking results 

We have performed numerical simulations to bench- 
mark the performances of the MERA learning method. 
We have generated random MERA states — by pick- 
ing each unitary gate in the circuit from the unitary 
group Haar measure — , simulated the experiment on 
those states, and use our algorithm to infer the initial 
MERA state. We did not introduce noise in measure- 
ments to simulate experimental errors since the error 
analysis indicates how those errors would build up. 

As mentioned before, there is no guarantee that our 
minimization procedure converges to the true minimum, 
resulting in small imperfections in the reconstructed 
state. Figure ([3j top) shows the distance between the 
reconstructed state and the actual state. As indicated 
by the dashed line, these results are in good agreement 
with a linear scaling of the error where the source of er- 
rors is due to finite machine precision and approximate 
minimisation of the objective function. 

The inference algorithm's complexity is dominated by 



2. Possible improvements 

Note the presence of isolated points on the graphs of 
Fig. [3] that achieve a lower fidelity and required a longer 
running time. These cases appear because the heuristic 
fails to find a global minimum. It appears that in some 
cases, a unitary transformation U23 meeting criterion ^ 
is not sufficient to guarantee that it will be possible to 
find subsequent disentanglers obeying (|3j. Put another 
way, locally minimising the objective function might not 
lead to a global optimum. Indeed, consider the following 
example. Let be a MERA state whose first qubit is 
disentangled from the rest of the chain, i.e. \tp) — \0)\4>). 
The rank of the density matrix on the first two qubits is at 
most 2 and that remains true after any unitary is applied 
on qubits 2 and 3. Thus, any choice of disentangler min- 
imises the objective function including the identity, 
i.e., applying no disentangler at all. However, suppose 
the state \(f>) on qubits 2 to n is highly entangled and 
that removing part of this entanglement between qubits 
2 and 3 was crucial to be able to reconstruct its MERA 
description. In this case, applying the identity on qubits 
2 and 3, even if locally optimal, was not globally opti- 
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mal. Hence, minimizing the objective function ^ seems 
to be necessary but not sufficient to successively identify 
all disent anglers. 

Although our numerical simulations suggest that this 
situation is rather atypical, it is possible to overcome this 
problem by imposing additional constraint on the disen- 
tangler. For instance, one can demand that the second 
qubit be in a state as pure as possible, effectively min- 
imizing the entanglement between the last qubit of one 
block and the first qubit of the next block. This corre- 
sponds to the following modified objective function 
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/(Pi2 [£/]) = $> fe + eA 2 



(14) 



fe>2 



i.e., we add a small perturbation that will only take ac- 
tion when the two smallest eigenvalues of pi2[&23] are 
very small and will further constrain the search. This 
slight modification solved the problematic situation we 
considered, and there exist many other heuristics to im- 
prove the method. 



D. Trading unitary control for repeated 
measurements 

For pedagogical reasons, we presented our learning 
method in a way that required disentanglers and isomc- 
tries to be physically applied to the experimental state 
in order to unravel the circuit. In this section, we will 
show how to circumvent unitary control at the price of 
slightly more elaborate numerical processing and con- 
suming more copies of the state. The main idea is to nu- 
merically simulate how measurements performed on the 
original, unaltered experimental system would be trans- 
formed if the unraveling circuit had been applied. 
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Figure 5: MERA as a renormalisation procedure that creates 
a sequence of states {p T } T - 
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Figure 6: Ascending superoperator and renormalized observ- 
ables for a ternary MERA. a) Ternary MERA with one site 
observable Oo that is transformed into a renormalized observ- 
able A(Oo) on the renormalized state, b) Tensor contraction 
corresponding to the ascending superoperator A. 



such that 



Tr[p T A(Or-i)] = Tt\pr-lO T -i]. 



(16) 



This recursively carries over to the experimental state p a 



Simulating measurements on renormalized state 



Tr[p T A T ' 



>Ai(O )]=Tr[p O }. 



(17) 



Recall that a MERA is an ansatz that corresponds 
to a renormalization procedure. Each renormalisation 
step maps a state to another one on fewer particles and 
schematically corresponds to a layer of the MERA circuit. 
Applying the first layer and removing the ancillary par- 
ticles that have been (approximately) disentangled maps 
the experimental state po on n particles to a state p± on 
fewer particles (see Fig |5| . Recursively, this procedure 
constructs a sequence of states {p T } T . 

To get from p r _i to p T , one can either perform this 
mapping physically by experimentally applying the gates 
corresponding to the MERA layer, or one can compute 
the function mapping p T _i to p T from the description of 
the gates. As in [T7], define a ascending superoperator A 
that maps an operator O r _i acting on layer r — 1 to the 
an operator O r acting on the next layer r 



T =A T (0 T - 1 ) 



(15) 



Thus, in order to extract information from a density ma- 
trix p T , one can measure the expectation value of sev- 
eral observables Oq on the density matrix pq. Measuring 
those observables will effectively amount to measuring 
the observables O l T = A T o • • • o Ai(O % ) on the density 
matrix p T . 

The ascending superoperator can be computed from 
the knowledge of the disentanglers and isometries. Its 
exact form depends on the physical support of the ob- 
servable. For instance, for ternary MERA, we can re- 
strict to ascending superoperator that only depends on 
the isometries of the MERA [22] (see Fig. [6]). This is a 
simple example where an experimental observable on one 
particle is mapped to observable on one particle. More 
generally, observables on many sites will be ascended to 
observables on fewer sites. Any choice of observables is 
valid as long as the renormalized observables { 0\ } span 
the support of the reduced density matrix p T . 
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2. Overhead in the number of measurements 

This procedure leads to a overhead in the total num- 
ber of measurements because renormalized observables 
are less efficient at extracting information. Suppose (for 
clarity) that we measure Pauli observables {Oq}^ on the 
experimental states. These observables are orthonormal 
for the Hilbert-Schmidt inner product and thus maxi- 
mize information extraction. However, the renormalized 
observables 0\ = Ai(O'q) need not be orthonormal. Con- 



sider their Gram matrix G, 



Tr 



O 



'(°Q 



which can 

be diagonalised by a unitary matrix Z. Its normalised 
eigenvectors R\ — -j== JT,j %ijO{ are orthornormal ob- 
servables but cannot be directly measured because they 
do not correspond to simple observables on the experi- 
mental state, but instead to linear combination of them. 
Thus, to reconstruct the density matrix p\ = '}2 i r\R\, 
the expectation values r\ = TrpiR\ have to be computed 
by taking a linear combination of the expectation values 
Oq = TtpqOq on the experimental state 



r = 



V A,; V Ai 



(18) 



Due to limited number of repeated measurements, es- 
timation of each o will present a variance V(o ). Sup- 
pose that measurements are repeated enough times to 
ensure that all variances are below a precision thresh- 
old, i.e., V(oq) < e. Since r\ is a linear combination 
of those measurements, it will have a variance V(r^) = 
^7 Y^j | 2 V(og) < e/Xi. Therefore, in order to ensure a 
precision e on the estimate of r\ , this imprecision needs to 
be compensated by multiplying the number of repeated 
measurements by the conditioning factor X^ 1 ■ 

When scaling operators on r layers, the conditioning 
factors for each layer will multiply (in the worst case) 
but we expect the conditioning for each layer to be a con- 
stant independant of system size. Thus, the total number 
of measurements will remain polynomial in the number 
of particles since there is only a logarithmic number of 
renormalisation layers. 

We can make this argument rigorous for critical sys- 
tems that exhibit scale-invariance, precisely the physical 
systems for which MERA was introduced. Due to scale- 
invariance, the ascending operator A T will not depend 
on the index of the layer and we refer to it as the scaling 
superoperator S [22]. Its diagonalization yields eigenvec- 
tors (j> a called scaling operators associated to eigenval- 
ues fi a . In [22 , it was shown that those eigenvalues are 
related to the scaling dimensions A Q of the underlying 
conformal field theory (CFT) by A a = log 3 p a where the 
basis of the log depends on the MERA type (here we con- 
sider a ternary MERA for clarity) . Scaling operators tfi a 
can be used as observables to extract information about 
states in higher level of the MERA. Indeed, one can sim- 
ulate a measurement of S T (4> a ) on p T by measuring the 
observable <f> a on po- We can analyze the increase in the 



number of measurements by distinguishing two sources 
of imprecision. First, to reconstruct p T one has to use 
normalised operator = 3 rA °6> r (0 Q ) whose increased 
statistical fluctuations have to be compensated by per- 
forming additional measurements. Second, diagonalising 
the Gram matrix of the will introduce another con- 
ditioning factor. However, this Gram matrix is indepen- 



Tr 



Tr [ 



dant of the layer since G a i 

Thus, the conditioning factor for layer r will be the prod- 
uct of a factor exponential in the number of layers and a 
constant factor coming from the orthonormalisation. 

This modified scheme circumvents the need of unitary 
control, but looses some of the features of the original 
scheme. First, because the system is not physically dis- 
entangled, we cannot certify directly the fidelity of the 
reconstruction. Second, as explained in appendix [X] the 
errors build up quadratically. 



III. DISCUSSION 

In this work, we have presented a tomography method 
that allows to efficiently learn the MERA description of 
a state by patching together tomography experiments on 
a few particles and using fast numerical processing. The 
method is heuristic but works very well in numerical sim- 
ulations. A complete analytical understanding of how to 
find an optimal disentangler at each step would be de- 
sirable, but may well be intractable. With regards to 
experimental use, the method should be tought of as a 
proof of principle and is flexible enough to be adapted to 
accomodate many experimental constraints. 

One issue of fundamental interest raised by our work 
is the relationship between the numerical tractability of 
a variational class of states and the ability to learn ef- 
ficiently the variational parameters. In order to be in- 
teresting, variational class of states must not only be de- 
scribed by a small number of parameters, but also allow 
for the efficient numerical computation of quantities of 
interest, such as the energy of the system, correlation 
functions, or more generally expectation values of local 
observables. On its own, an efficient representation is 
of limited computational usefulness. For instance, the 
Gibbs state or thermal state of a local Hamiltonian is 
described by a few parameters — a temperature and a 
local Hamiltonian — but does not allow to extract phys- 
ical quantities of interest efficiently. Another example is 
the variational class of projected entangled pair states or 
PEPS [28J , the generalization of MPS to system in more 
than one dimension. While PEPS have been instrumen- 
tal in better understanding of quantum many-body sys- 
tems, they are in general intractable numerically [29 . 

Is there a relation between numerical tractability and 
efficient tomography? The method presented in jT5] to 
learn a MPS from local measurements made explicit use 
of the energy minimization algorithm for MPS; namely 
DMRG [30]. This example suggests that numerical 
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tractability could imply that learning the variational pa- 
rameters is possible. In that regard, MERA are intrigu- 
ing states because they live at the frontier of tractabil- 
ity. Indeed, in more than 1 dimension, MERA states 
are a subclass of PEPS [3T with a bond dimension in- 
dependant of system size (32|. While the computation 
of expectation values of local observables is believed to 
be intractable for PEPS, it is efficient for MERA. In one 
dimension, MERA can be seen as MPS if one allows the 
bond dimension to grow polynomially with the size of the 
system (while MPS are usually required to have a con- 
stant bond dimension). Thus, while MPS manipulations 
typically have a computational cost linear in the number 
of particles, 1D-MERA manipulations have a computa- 
tional cost which is superlinear (but yet polynomial). 

Beyond MPS and MERA, one could consider states 
obtained from a quantum circuit where the positions of 



the gates are known and try to identify those gates. An 
interesting question is then to characterize what topol- 
ogy of circuits makes it possible to learn gates efficiently. 
This could lead to formal methods for the testing and 
verification of quantum hardware. 
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Appendix A: Error analysis without post-selection 

The modified scheme that circumvents the need of 
unitary control modifies the error propagation. Indeed, 
the scaling of the overall error increases since the er- 
ror at each step will depend on previous errors. Es- 
sentially, to find the optimal disentangler, the algorithm 
will not receive the perfect state \n k ) (see eq. Q) but 

the state j^\r, k ) { Vk \ + j^\E?») {E%»\ where \Eg") 

1 k k 

is a subnormalised error vector resulting from the ac- 
cumulation of all previous errors whose square norm is 
E^ 71 = (E^lE^J 71 ). Thus, the numerical minimisation 
returns a unitary that is not the perfect disentangler. 

In the degenerate case — when there are many uni- 
taries reaching roughly the same minimum value of the 
objective function — , this might change the disentangling 
unitary drastically. Indeed, we note that the existence of 
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degenerate local minima affects both of our tomography 
methods, the one with and the one without unitary con- 
trol of the system. In such degenerate cases, exploring 
many local minima by selecting random initial guesses 
could get around the problem. However, it is likely that 
these instances are intrinsically hard and that our al- 
gorithm does not converge to the right answer in those 
cases, c.f. the atypical data points on Fig. [3] 

In the non-degenerate case, we can argue that the ac- 
cumulation of errors would be quadratic in the number 
of particles. We proceed in three steps. First, we ana- 
lyze how the modification of the input state will affect the 
disentangling unitary returned by the algorithm. Second, 
we evaluate how this imperfect disentangler impacts the 
error propagation. Third, we bound the error to show 
the quadratic scaling. 

Our algorithm returns the unitary U = e lH that min- 
imizes the objective function (|3l for a given state p. If 
we don't post-select on the ancillary particles being dis- 
entangled, this minimization is not performed on the 
the perfect state po = \f]k)(Vk\ but rather on the state 
{l-e)p a +e<7 where e = j^j^ and a = \E^ n )(E 



l m \. We 

want to know how much U ~ arg minj/ f(U, p) changes 
when p changes. Using the chain rule, we formally write 

i 'Wp ~ ^ e ^ rs ^ * erm quantifies how much U 

changes when the objective function changes for a given 
p. In the non-degenerate case, we expect this term to 
be bounded in norm by a Lipschitz constant 77. The sec- 
ond term evaluates how the objective function changes 
when going from po to (1 — e)po + ea. Recalling that 
the objective function is a sum of eigenvalues and using 
non-degenerate perturbation theory, this term is going 
to be proportional to e. Thus, instead of U — e lH . the 
minimization algorithm returns e l ( H + er i A ) ~ WU where 
W = e ier,A . 



As a consequence, eq. ( 10 ) is modified to read 
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Vk+i) 



l^ c Ti) 



V 1 + E k+i 



( A1 ) 

Taking into account the anomalous unitary W , we get 
1 + E^ = l + ei™ 1 //3 2 = (l + e k+1 )(l + En/P 2 (A2) 

where /3 2 = \(vk+i\(0\^ k+1 W\0) tsk+1 \r lk+1 )\ 2 = \(W)\ 2 . 
Using W — e levA , calculations show that 



2 



1 



-V ((A 2 ) - (A) 2 ) = 1 - e 2 r] 2 A 



2^2 A 2 



(A3) 



where the variance A 2 of A with respect to state 
|0) ® k+1 |?7fc+i) appears. Recalling that e = fxgem > one 
gets 



> 



1 



for any E^ m if ?7 2 A 2 < 1 or for small E% m otherwise 



(A4) 



Thus, the error magnitude E^ 1 obeys the recurrence 
relation 

1 + ^ (l + ^ cm ) 3 (l + ^+i) 

< (l + ei ) 3fc ...(l + e fc+1 ). 

Assuming that the error at each step is bounded e k < e, 
the total error scales as 



E m < -m e. 



(A5) 



Appendix B: Comparing a reconstructed MERA to 
a predicted MERA 

In this section, we describe a polynomial algorithm to 
contract two MERA states, thus allowing to compute 
their fidelity. This algorithm is of practical interest for 
comparing a MERA whose parameters have been iden- 
tified experimentally using our method to a predicted 
MERA state -found by numerical optimisation for in- 
stance. Notice that contracting two different MERA 
states also allows to compute expectation values of ten- 
sor product of local observables Ai since it suffices 
to contract the original state \ip) and the modified state 
\4>) = Ai\tp). which is also a MERA state. 

Defining a method to contract two MERA states is 
equivalent to giving a prescription on how to sequen- 
tially contract the tensor network resulting from joining 
two MERA states. Recall that contracting two tensors 
( M )ia] b and ( N )k b £ c to obtain T ialc = J2j b M iah N Mc 
has a computational cost of a x b x c where a is the num- 
ber of values that the index i a can take b and c are defined 
in the same way with respect to jb and i c . In a tensor 
network, every tensor is usually represented with a num- 
ber of bonds that each represent an index that has the 
same maximal number of possible values. For a MERA, 
this maximal bond dimension is usually denoted by \- 

The main idea to contract efficiently two MERA states 
is essentially to turn them into two MPS before contract- 
ing them. We look at the MERA circuit as having n/2 
columns of gates vertically and log x n — 1 renormalisa- 
tion layers horizontally. The sequence of contraction is 
to sequentially contract every tensor in the leftmost col- 
umn to create a tensor with a large number of bonds that 
will then contract with every tensor in the next column. 
The maximal number of bonds that this leftmost tensor 
will have throughout the contraction of the network is 
given by the maximal number of bonds that are opened 
when taking a vertical cut in the tensor network. For 
a single MERA, cutting through each of the log x n — 1 
layer opens up two bonds, one for the righmost incoming 
edge of the isometry and one for the outgoing edge of the 
isometry. Thus, for the contraction of two MERAs, the 
maximum number of bonds for a vertical cut is bounded 
by max # = 2 x 2 x log x n — 4 log x n, which is verified nu- 
merically (see top of Fig. [7]). Since at every contraction 
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step, the leftmost tensor with a large number of bonds 
contract with another tensor that has at most two bonds 
in addition to the ones being contracted, the maximum 
cost of one contraction is x max *X 2 = X 2 ™ 4 - Finally, 
there are 0(n) disentanglers and isometries to contract 
so the total cost of contracting the network is bounded 
by 0(n 5 ). Actual numerical simulations show that this 
bound is probably not tight (see bottom of Fig. [7]). 



Figure 7: (top) Maximum number of bonds during the con- 
traction procedure as a function of the logarithm of the num- 
ber of qubits n. Numerical results (solid blue line) are con- 
sistent with the expected bound of 4 log 2 n. (bottom) Con- 
traction cost C as a function of the number n of qubits on 
a logscale. Numerical results (solid blue line) are consistent 
with the 0(n 5 ) bound (dot dashed green line) but linear ap- 
proximation (dashed red line) indicate that the cost scales like 
a smaller power of n, namely C ~ n 4 ' 3 . 



